Useful statistics for corpus linguistics

Author

  • Stefan Th. Gries
Abstract

• frequencies of occurrence of linguistic elements, which can be studied from two different perspectives:
  o how frequent are morphemes or words or patterns/constructions in (parts of) a corpus? This information can be provided in various forms of frequency lists;
  o how evenly are morphemes or words or patterns/constructions distributed across (parts of) a corpus? This information can be provided in the form of various dispersion statistics;
• frequencies of co-occurrence: how often do linguistic elements such as morphemes, words, or patterns/constructions co-occur with another linguistic element from this set or with a position in a text?
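The three kinds of statistics named above can be illustrated with a minimal Python sketch. The toy corpus, the function names, and the choice of measures are assumptions made for illustration only: the abstract does not say which dispersion statistic or co-occurrence measure the paper uses. Here dispersion is shown with the deviation of proportions (DP), which is one of several possible dispersion statistics, and co-occurrence is shown as raw counts of adjacent word pairs.

```python
from collections import Counter

# Hypothetical toy corpus split into three parts (illustration only).
parts = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "a cat and a dog".split(),
]


def frequency_list(parts):
    """Return (token, count) pairs for the whole corpus, most frequent first."""
    counts = Counter(tok for part in parts for tok in part)
    return counts.most_common()


def deviation_of_proportions(word, parts):
    """DP = 0.5 * sum over parts of |observed share - expected share|.

    The expected share of a part is its size relative to the corpus;
    the observed share is the fraction of the word's occurrences in it.
    Values near 0 mean an even distribution, values near 1 an uneven one.
    """
    total_tokens = sum(len(p) for p in parts)
    word_total = sum(p.count(word) for p in parts)
    if word_total == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    return 0.5 * sum(
        abs(p.count(word) / word_total - len(p) / total_tokens) for p in parts
    )


def cooccurrence_counts(parts):
    """Count adjacent word pairs (bigrams) within each corpus part."""
    pairs = Counter()
    for part in parts:
        pairs.update(zip(part, part[1:]))
    return pairs


if __name__ == "__main__":
    print(frequency_list(parts)[:3])
    print(round(deviation_of_proportions("the", parts), 3))
    print(cooccurrence_counts(parts)[("the", "cat")])
```

Raw bigram counts are only the starting point for co-occurrence analysis; association measures that compare observed with expected co-occurrence frequencies would build on the same counts.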

Similar resources

Do We Need Discipline-Specific Academic Word Lists? Linguistics Academic Word List (LAWL)

This corpus-based study aimed to explore the most frequently used academic words in linguistics and to compare the resulting word list with the distribution of high-frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) in order to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately ...

Automatic Processing of Large Corpora for the Resolution of Anaphora References

Manual acquisition of semantic constraints in broad domains is very expensive. This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. To a large extent, these statistics reflect semantic constraints and thus are used to disambiguate anaphora references and syntactic ambiguities. The scheme was implemented by gathering statistics on the ou...

Online statistics labs

Recent publications in the field of corpus linguistics (including several in this and the previous issue of CLLT) strongly indicate that the field is on its way from a view of corpora as mere repositories of authentic data from which examples can be culled ad libitum to a methodology that analyzes linguistic phenomena systematically and exhaustively as they manifest themselves in corpus data. T...

Extracting Syntax Statistics from Large Corpora of Written English

The field of linguistics has seen a growing interest in the statistics of everyday language. In studying how we acquire language and why some of its aspects are more difficult for us than others, it is critical to understand the linguistic environment to which we are exposed. However, gathering statistics over syntactic structures, even with a syntactically tagged corpus, can be difficult and t...

Publication date: 2009